Overview
Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 14776615 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.7 GiB |
| Average record size in memory | 271.0 B |
Variable types
| Text | 2 |
|---|---|
| Categorical | 3 |
| DateTime | 1 |
| Numeric | 5 |
Alerts
| fare_amount is highly overall correlated with mta_tax and 1 other field | High correlation |
|---|---|
| mta_tax is highly overall correlated with fare_amount and 1 other field | High correlation |
| tolls_amount is highly overall correlated with mta_tax | High correlation |
| total_amount is highly overall correlated with fare_amount | High correlation |
| payment_type is highly imbalanced (55.6%) | Imbalance |
| mta_tax is highly imbalanced (96.9%) | Imbalance |
| surcharge has 7596039 (51.4%) zeros | Zeros |
| tip_amount has 7236560 (49.0%) zeros | Zeros |
| tolls_amount has 14189538 (96.0%) zeros | Zeros |
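The imbalance percentages are consistent with an entropy-based score: 1 minus the Shannon entropy of the category frequencies divided by log2 of the number of categories. Treating that formula as an assumption about how the profiler scores imbalance, the sketch below recomputes it from the category counts listed in this report. It reproduces 55.6% for payment_type and 96.9% for mta_tax, while the nearly balanced vendor_id column scores close to 0%. `imbalance_score` is a hypothetical helper, not the library's API.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / log2(k), where H is the Shannon entropy (bits) of the counts.

    0.0 means perfectly balanced categories; values near 1.0 mean one category dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # a single category cannot be imbalanced in this sense
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Category counts copied from the Common Values tables of this report.
payment_type_counts = [7743844, 6982383, 32783, 11171, 6434]  # CRD, CSH, NOC, DIS, UNK
mta_tax_counts = [14729241, 47374]                             # 0.5, 0.0
vendor_id_counts = [7450899, 7325716]                          # CMT, VTS

for name, counts in [("payment_type", payment_type_counts),
                     ("mta_tax", mta_tax_counts),
                     ("vendor_id", vendor_id_counts)]:
    print(f"{name}: {imbalance_score(counts):.1%}")
```

This prints 55.6%, 96.9%, and 0.0% for the three columns.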
Reproduction
| Analysis started | 2025-10-28 01:17:10.760873 |
|---|---|
| Analysis finished | 2025-10-28 01:20:05.551553 |
| Duration | 2 minutes and 54.79 seconds |
| Software version | ydata-profiling v4.17.0 |
Variables
medallion
Text
| Distinct | 13426 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 GiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 32 |
| Min length | 32 |
Unique
| Unique | 33 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 89D227B655E5C82AECF13C3F540D4CF4 |
|---|---|
| 2nd row | 0BD7C8F5BA12B88E0B67BED28BEA73D8 |
| 3rd row | 0BD7C8F5BA12B88E0B67BED28BEA73D8 |
| 4th row | DFD2202EE08F7A8DC9A57B02ACB81FE2 |
| 5th row | DFD2202EE08F7A8DC9A57B02ACB81FE2 |
| Value | Count | Frequency (%) |
|---|---|---|
| 7e1346f23960cc18d7d129fa28b63a75 | 2137 | < 0.1% |
| 6ffcf7a4f34ba44239636028e680e438 | 2112 | < 0.1% |
| a979cda04cfb8ba3d3acba7e8d7f0661 | 2039 | < 0.1% |
| d5c7cd37ea4d372d00f0a681cdc93f11 | 1959 | < 0.1% |
| 849e486825860106403fb991a763bcc3 | 1957 | < 0.1% |
| 6fe6dff9a59c0b64be0ca64ee2699f08 | 1941 | < 0.1% |
| 06c961ebe7ef4d13f3ae6c005ee0f501 | 1893 | < 0.1% |
| 22908753e00888cc219c875c8d5bc4f6 | 1886 | < 0.1% |
| e6101a0f85312c49a5b5950e61d284dc | 1882 | < 0.1% |
| 6403bf98e4618e21c795c3b45a636d77 | 1882 | < 0.1% |
| Other values (13416) | 14756927 | 99.9% |
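The Distinct, Unique, and "Other values" rows are count-based. Distinct is the number of different values; Unique is read here as the number of values that occur exactly once (33 of the 13,426 medallions); and the ten most frequent values are listed individually, with the remaining 13,426 - 10 = 13,416 values folded into a single row. A minimal sketch of that bookkeeping with `collections.Counter`, where `value_summary` is a hypothetical helper and the input is the five medallion values from the Sample table:

```python
from collections import Counter


def value_summary(values, top_k=10):
    """Hypothetical helper: distinct/unique counts plus a top-k table with an 'Other values' row."""
    counts = Counter(values)
    n = len(values)
    top = counts.most_common(top_k)  # ties keep first-seen order
    return {
        "distinct": len(counts),
        "distinct_pct": 100.0 * len(counts) / n,
        # "Unique" is read here as the number of values that occur exactly once.
        "unique": sum(1 for c in counts.values() if c == 1),
        "top": top,
        # Remaining values collapse into one row: (how many values, how many rows).
        "other_values": (len(counts) - len(top), n - sum(c for _, c in top)),
    }


# The first five medallion values from the Sample table.
sample = [
    "89D227B655E5C82AECF13C3F540D4CF4",
    "0BD7C8F5BA12B88E0B67BED28BEA73D8",
    "0BD7C8F5BA12B88E0B67BED28BEA73D8",
    "DFD2202EE08F7A8DC9A57B02ACB81FE2",
    "DFD2202EE08F7A8DC9A57B02ACB81FE2",
]
summary = value_summary(sample, top_k=2)
print(summary["distinct"], summary["unique"], summary["other_values"])
```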
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| E | 29839347 | 6.3% |
| 4 | 29833374 | 6.3% |
| A | 29731461 | 6.3% |
| 9 | 29695897 | 6.3% |
| 5 | 29640335 | 6.3% |
| F | 29609660 | 6.3% |
| D | 29606522 | 6.3% |
| 7 | 29545983 | 6.2% |
| 6 | 29539893 | 6.2% |
| 2 | 29522375 | 6.2% |
| Other values (6) | 176286833 | 37.3% |
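Every medallion is a 32-character hexadecimal string (all length statistics equal 32), so the character table covers 32 × 14,776,615 = 472,851,680 characters. This is also the single "(unknown)" total in the category, script, and block tables. Each of the 16 hex digits lands near 1/16 = 6.25%, as expected for evenly distributed digits. A minimal sketch of how such a character-frequency table can be built with `collections.Counter`; `char_frequencies` is an illustrative helper, not the profiler's own code:

```python
from collections import Counter


def char_frequencies(values, top_k=10):
    """Hypothetical helper: most common characters across all strings, with percentages."""
    counts = Counter()
    for v in values:
        counts.update(v)  # updating with a string counts each character
    total = sum(counts.values())
    rows = [(ch, c, 100.0 * c / total) for ch, c in counts.most_common(top_k)]
    other = total - sum(c for _, c, _ in rows)
    return rows, other, total


# Every medallion is 32 characters long, so the table covers 32 * 14,776,615 characters.
print(32 * 14776615)  # 472851680, the "(unknown)" total in the category tables

rows, other, total = char_frequencies(["89D227B655E5C82AECF13C3F540D4CF4",
                                       "0BD7C8F5BA12B88E0B67BED28BEA73D8"], top_k=3)
for ch, c, pct in rows:
    print(ch, c, f"{pct:.1f}%")
```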
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 472851680 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 472851680 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 472851680 | 100.0% |
hack_license
Text
| Distinct | 32224 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 GiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 32 |
| Min length | 32 |
Unique
| Unique | 182 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | BA96DE419E711691B9445D6A6307C170 |
|---|---|
| 2nd row | 9FD8F69F0804BDB5549F40E9DA1BE472 |
| 3rd row | 9FD8F69F0804BDB5549F40E9DA1BE472 |
| 4th row | 51EE87E3205C985EF8431D850C786310 |
| 5th row | 51EE87E3205C985EF8431D850C786310 |
| Value | Count | Frequency (%) |
|---|---|---|
| 00b7691d86d96aebd21dd9e138f90840 | 1933 | < 0.1% |
| f49fd0d84449ae7f72f3bc492cd6c754 | 1616 | < 0.1% |
| 51c1be97280a80ebfa8dad34e1956cf6 | 1603 | < 0.1% |
| 847349f8845a667d9ac7cdedd1c873cb | 1570 | < 0.1% |
| ce625fd96d0fafc812a6957139b354a1 | 1557 | < 0.1% |
| 3d757e111c78f5cac83d44a92885d490 | 1514 | < 0.1% |
| 22ca618759c716436ea3257480199a32 | 1501 | < 0.1% |
| 3aab94ca53fe93a64811f65690654649 | 1486 | < 0.1% |
| e66e58207128619cff2d2e2c3c7ecc08 | 1442 | < 0.1% |
| c9674190984ba193ffd8ddcc019804cf | 1390 | < 0.1% |
| Other values (32214) | 14761003 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| C | 29776443 | 6.3% |
| E | 29713235 | 6.3% |
| 8 | 29655194 | 6.3% |
| 5 | 29608188 | 6.3% |
| 0 | 29603323 | 6.3% |
| D | 29589934 | 6.3% |
| 3 | 29585835 | 6.3% |
| 7 | 29576279 | 6.3% |
| F | 29553071 | 6.2% |
| B | 29539186 | 6.2% |
| Other values (6) | 176650992 | 37.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 472851680 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 472851680 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 472851680 | 100.0% |
vendor_id
Categorical
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | CMT |
|---|---|
| 2nd row | CMT |
| 3rd row | CMT |
| 4th row | CMT |
| 5th row | CMT |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| CMT | 7450899 | 50.4% |
| VTS | 7325716 | 49.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| T | 14776615 | 33.3% |
| C | 7450899 | 16.8% |
| M | 7450899 | 16.8% |
| V | 7325716 | 16.5% |
| S | 7325716 | 16.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
pickup_datetime
Date
| Distinct | 2303465 |
|---|---|
| Distinct (%) | 15.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 112.7 MiB |
| Minimum | 2013-01-01 00:00:00 |
|---|---|
| Maximum | 2013-01-31 23:59:59 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
Histogram with fixed size bins (bins=50)
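pickup_datetime covers January 2013. Its 2,303,465 distinct timestamps compare with 31 × 86,400 = 2,678,400 possible whole seconds in that month. A minimal sketch of the Minimum, Maximum, Distinct, and Invalid dates rows, assuming timestamps arrive as strings in the `YYYY-MM-DD HH:MM:SS` format shown in the sample rows and that an unparseable string counts as an invalid date. `datetime_summary` is a hypothetical helper, and the input mixes four sample timestamps with one deliberately bad value:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # format assumed from the sample rows


def datetime_summary(raw_values):
    """Hypothetical helper: parse timestamps and count unparseable strings as invalid."""
    parsed, invalid = [], 0
    for raw in raw_values:
        try:
            parsed.append(datetime.strptime(raw, FMT))
        except ValueError:
            invalid += 1
    return {
        "minimum": min(parsed) if parsed else None,
        "maximum": max(parsed) if parsed else None,
        "distinct": len(set(parsed)),
        "invalid": invalid,
    }


sample = ["2013-01-01 15:11:48", "2013-01-06 00:18:35", "2013-01-05 18:49:41",
          "2013-01-07 23:54:15", "not a date"]
summary = datetime_summary(sample)
print(summary["minimum"], summary["maximum"], summary["invalid"])
```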
payment_type
Categorical
Imbalance
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.1 MiB |
| CRD | 7743844 |
|---|---|
| CSH | 6982383 |
| NOC | 32783 |
| DIS | 11171 |
| UNK | 6434 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | CSH |
|---|---|
| 2nd row | CSH |
| 3rd row | CSH |
| 4th row | CSH |
| 5th row | CSH |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| CRD | 7743844 | 52.4% |
| CSH | 6982383 | 47.3% |
| NOC | 32783 | 0.2% |
| DIS | 11171 | 0.1% |
| UNK | 6434 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| C | 14759010 | 33.3% |
| D | 7755015 | 17.5% |
| R | 7743844 | 17.5% |
| S | 6993554 | 15.8% |
| H | 6982383 | 15.8% |
| N | 39217 | 0.1% |
| O | 32783 | 0.1% |
| I | 11171 | < 0.1% |
| U | 6434 | < 0.1% |
| K | 6434 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
fare_amount
Real number (ℝ)
High correlation
| Distinct | 1417 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.664722 |
| Minimum | 2.5 |
|---|---|
| Maximum | 500 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 2.5 |
|---|---|
| 5-th percentile | 4.5 |
| Q1 | 6.5 |
| median | 9 |
| Q3 | 13 |
| 95-th percentile | 30 |
| Maximum | 500 |
| Range | 497.5 |
| Interquartile range (IQR) | 6.5 |
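The quantile rows are consistent with linear interpolation between order statistics. For fare_amount, IQR = Q3 - Q1 = 13 - 6.5 = 6.5 and Range = 500 - 2.5 = 497.5. The sketch below assumes that interpolation rule (it matches the NumPy and pandas defaults) and uses the first ten fare_amount values from the sample rows as toy input. `quantile_summary` is a hypothetical helper:

```python
import statistics


def quantile_summary(values):
    """Hypothetical helper: the Quantile statistics rows, using linear interpolation."""
    data = sorted(values)
    # method="inclusive" places the p-quantile at position (n - 1) * p, like NumPy's default.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    ventiles = statistics.quantiles(data, n=20, method="inclusive")  # 5th, 10th, ..., 95th
    return {
        "Minimum": data[0],
        "5-th percentile": ventiles[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95-th percentile": ventiles[-1],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "IQR": q3 - q1,
    }


# fare_amount from the first ten sample rows (toy input, not the full column).
fares = [6.5, 6.0, 5.5, 5.0, 9.5, 9.5, 6.0, 34.0, 5.5, 13.0]
for key, value in quantile_summary(fares).items():
    print(f"{key}: {value:g}")
```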
Descriptive statistics
| Standard deviation | 9.6392187 |
|---|---|
| Coefficient of variation (CV) | 0.82635649 |
| Kurtosis | 48.307291 |
| Mean | 11.664722 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 4.0728317 |
| Sum | 1.7236511 × 10⁸ |
| Variance | 92.914538 |
| Monotonicity | Not monotonic |
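Several descriptive rows follow directly from others. CV is the standard deviation divided by the mean (9.6392187 / 11.664722 ≈ 0.826 for fare_amount), Variance is the standard deviation squared (9.6392187² ≈ 92.91), and MAD is read here as the unscaled median of |x - median|. The skewness and kurtosis lines use the bias-adjusted Fisher-Pearson skewness and the adjusted excess kurtosis that pandas applies in `Series.skew()` and `Series.kurt()`. Whether the report used exactly these estimators is an assumption. `descriptive_summary` is a hypothetical helper and needs at least four values:

```python
import math
import statistics


def descriptive_summary(values):
    """Hypothetical helper mirroring the Descriptive statistics rows (needs n >= 4)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])  # unscaled MAD
    # Population central moments, then pandas-style bias adjustments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))  # excess kurtosis
    return {"std": std, "cv": std / mean, "kurtosis": kurtosis, "mean": mean,
            "mad": mad, "skewness": skewness, "variance": std ** 2}


# fare_amount from the first ten sample rows (toy input, not the full column).
fares = [6.5, 6.0, 5.5, 5.0, 9.5, 9.5, 6.0, 34.0, 5.5, 13.0]
for key, value in descriptive_summary(fares).items():
    print(f"{key}: {value:.4f}")
```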
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 813197 | 5.5% |
| 6.5 | 806688 | 5.5% |
| 5.5 | 793987 | 5.4% |
| 7 | 782107 | 5.3% |
| 7.5 | 748123 | 5.1% |
| 5 | 725076 | 4.9% |
| 8 | 702986 | 4.8% |
| 8.5 | 655601 | 4.4% |
| 9 | 604185 | 4.1% |
| 4.5 | 592918 | 4.0% |
| Other values (1407) | 7551747 | 51.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 61249 | 0.4% |
| 2.55 | 1 | < 0.1% |
| 2.6 | 2 | < 0.1% |
| 2.69 | 1 | < 0.1% |
| 2.7 | 6 | < 0.1% |
| 2.75 | 1 | < 0.1% |
| 2.8 | 4 | < 0.1% |
| 2.82 | 3 | < 0.1% |
| 2.83 | 3 | < 0.1% |
| 2.85 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500 | 10 | < 0.1% |
| 479.4 | 1 | < 0.1% |
| 476.66 | 1 | < 0.1% |
| 475 | 4 | < 0.1% |
| 470 | 7 | < 0.1% |
| 468.5 | 1 | < 0.1% |
| 465 | 1 | < 0.1% |
| 460 | 1 | < 0.1% |
| 450.01 | 1 | < 0.1% |
| 450 | 10 | < 0.1% |
surcharge
Real number (ℝ)
Zeros
| Distinct | 23 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.32049042 |
| Minimum | 0 |
|---|---|
| Maximum | 12.5 |
| Zeros | 7596039 |
| Zeros (%) | 51.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.5 |
| 95-th percentile | 1 |
| Maximum | 12.5 |
| Range | 12.5 |
| Interquartile range (IQR) | 0.5 |
Descriptive statistics
| Standard deviation | 0.36757414 |
|---|---|
| Coefficient of variation (CV) | 1.1469115 |
| Kurtosis | -0.60733 |
| Mean | 0.32049042 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.68750232 |
| Sum | 4735763.5 |
| Variance | 0.13511075 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=23)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7596039 | 51.4% |
| 0.5 | 4890084 | 33.1% |
| 1 | 2290231 | 15.5% |
| 1.5 | 190 | < 0.1% |
| 2 | 26 | < 0.1% |
| 2.5 | 13 | < 0.1% |
| 3 | 7 | < 0.1% |
| 0.41 | 4 | < 0.1% |
| 0.82 | 3 | < 0.1% |
| 3.5 | 3 | < 0.1% |
| Other values (13) | 15 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7596039 | 51.4% |
| 0.02 | 1 | < 0.1% |
| 0.05 | 1 | < 0.1% |
| 0.08 | 1 | < 0.1% |
| 0.1 | 1 | < 0.1% |
| 0.15 | 1 | < 0.1% |
| 0.41 | 4 | < 0.1% |
| 0.5 | 4890084 | 33.1% |
| 0.82 | 3 | < 0.1% |
| 1 | 2290231 | 15.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.5 | 2 | < 0.1% |
| 10 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 8 | 2 | < 0.1% |
| 7.5 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 3.5 | 3 | < 0.1% |
| 3 | 7 | < 0.1% |
mta_tax
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 732.8 MiB |
| 0.5 | 14729241 |
|---|---|
| 0.0 | 47374 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 14729241 | 99.7% |
| 0.0 | 47374 | 0.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 14823989 | 33.4% |
| . | 14776615 | 33.3% |
| 5 | 14729241 | 33.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 44329845 | 100.0% |
tip_amount
Real number (ℝ)
Zeros
| Distinct | 2768 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.2675086 |
| Minimum | 0 |
|---|---|
| Maximum | 200 |
| Zeros | 7236560 |
| Zeros (%) | 49.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.8 |
| Q3 | 2 |
| 95-th percentile | 4.75 |
| Maximum | 200 |
| Range | 200 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 2.0460844 |
|---|---|
| Coefficient of variation (CV) | 1.6142568 |
| Kurtosis | 177.72682 |
| Mean | 1.2675086 |
| Median Absolute Deviation (MAD) | 0.8 |
| Skewness | 6.0812817 |
| Sum | 18729486 |
| Variance | 4.1864613 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7236560 | 49.0% |
| 1 | 1342116 | 9.1% |
| 2 | 715709 | 4.8% |
| 1.5 | 595019 | 4.0% |
| 3 | 233031 | 1.6% |
| 2.5 | 192926 | 1.3% |
| 1.8 | 183703 | 1.2% |
| 1.4 | 177746 | 1.2% |
| 1.2 | 174359 | 1.2% |
| 1.6 | 174284 | 1.2% |
| Other values (2758) | 3751162 | 25.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7236560 | 49.0% |
| 0.01 | 3403 | < 0.1% |
| 0.02 | 1079 | < 0.1% |
| 0.03 | 428 | < 0.1% |
| 0.04 | 173 | < 0.1% |
| 0.05 | 890 | < 0.1% |
| 0.06 | 194 | < 0.1% |
| 0.07 | 182 | < 0.1% |
| 0.08 | 776 | < 0.1% |
| 0.09 | 283 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 200 | 4 | < 0.1% |
| 197 | 1 | < 0.1% |
| 187.75 | 1 | < 0.1% |
| 182.45 | 1 | < 0.1% |
| 180.3 | 1 | < 0.1% |
| 177 | 1 | < 0.1% |
| 166 | 1 | < 0.1% |
| 165 | 1 | < 0.1% |
| 161 | 1 | < 0.1% |
| 160 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
High correlation Zeros
| Distinct | 502 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.20186698 |
| Minimum | 0 |
|---|---|
| Maximum | 20 |
| Zeros | 14189538 |
| Zeros (%) | 96.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.0354807 |
|---|---|
| Coefficient of variation (CV) | 5.1295198 |
| Kurtosis | 43.02759 |
| Mean | 0.20186698 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.8594753 |
| Sum | 2982910.6 |
| Variance | 1.0722202 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 14189538 | 96.0% |
| 4.8 | 547869 | 3.7% |
| 10.25 | 11614 | 0.1% |
| 2.2 | 5582 | < 0.1% |
| 8.25 | 4832 | < 0.1% |
| 9.6 | 3296 | < 0.1% |
| 6.5 | 1111 | < 0.1% |
| 14.4 | 895 | < 0.1% |
| 15.05 | 878 | < 0.1% |
| 5 | 458 | < 0.1% |
| Other values (492) | 10542 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 14189538 | 96.0% |
| 0.01 | 175 | < 0.1% |
| 0.02 | 16 | < 0.1% |
| 0.03 | 8 | < 0.1% |
| 0.04 | 39 | < 0.1% |
| 0.05 | 10 | < 0.1% |
| 0.06 | 11 | < 0.1% |
| 0.08 | 7 | < 0.1% |
| 0.09 | 8 | < 0.1% |
| 0.1 | 9 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 42 | < 0.1% |
| 19.85 | 81 | < 0.1% |
| 19.8 | 4 | < 0.1% |
| 19.75 | 4 | < 0.1% |
| 19.65 | 2 | < 0.1% |
| 19.55 | 1 | < 0.1% |
| 19.5 | 11 | < 0.1% |
| 19.45 | 2 | < 0.1% |
| 19.4 | 1 | < 0.1% |
| 19.25 | 2 | < 0.1% |
total_amount
Real number (ℝ)
High correlation
| Distinct | 8695 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.952985 |
| Minimum | 2.5 |
|---|---|
| Maximum | 650 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 2.5 |
|---|---|
| 5-th percentile | 5.4 |
| Q1 | 7.7 |
| median | 10.5 |
| Q3 | 15.5 |
| 95-th percentile | 36.8 |
| Maximum | 650 |
| Range | 647.5 |
| Interquartile range (IQR) | 7.8 |
Descriptive statistics
| Standard deviation | 11.464686 |
|---|---|
| Coefficient of variation (CV) | 0.82166547 |
| Kurtosis | 36.258167 |
| Mean | 13.952985 |
| Median Absolute Deviation (MAD) | 3.5 |
| Skewness | 3.8875555 |
| Sum | 2.0617789 × 10⁸ |
| Variance | 131.43902 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.5 | 564079 | 3.8% |
| 9 | 555271 | 3.8% |
| 8 | 546099 | 3.7% |
| 7 | 544153 | 3.7% |
| 7.5 | 531046 | 3.6% |
| 6 | 515046 | 3.5% |
| 9.5 | 497237 | 3.4% |
| 8.5 | 480776 | 3.3% |
| 10 | 432154 | 2.9% |
| 5.5 | 405590 | 2.7% |
| Other values (8685) | 9705164 | 65.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 52 | < 0.1% |
| 2.55 | 1 | < 0.1% |
| 2.6 | 2 | < 0.1% |
| 2.69 | 1 | < 0.1% |
| 2.7 | 1 | < 0.1% |
| 2.75 | 1 | < 0.1% |
| 2.8 | 1 | < 0.1% |
| 2.9 | 5 | < 0.1% |
| 3 | 27854 | 0.2% |
| 3.01 | 53 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 650 | 2 | < 0.1% |
| 508.25 | 1 | < 0.1% |
| 500.5 | 1 | < 0.1% |
| 500 | 7 | < 0.1% |
| 494 | 1 | < 0.1% |
| 479.4 | 1 | < 0.1% |
| 476.66 | 1 | < 0.1% |
| 475 | 4 | < 0.1% |
| 470.5 | 7 | < 0.1% |
| 469.5 | 1 | < 0.1% |
Correlations
| | fare_amount | mta_tax | payment_type | surcharge | tip_amount | tolls_amount | total_amount | vendor_id |
|---|---|---|---|---|---|---|---|---|
| fare_amount | 1.000 | 0.561 | 0.010 | -0.004 | 0.320 | 0.319 | 0.975 | 0.004 |
| mta_tax | 0.561 | 1.000 | 0.035 | 0.008 | 0.146 | 0.564 | 0.371 | 0.007 |
| payment_type | 0.010 | 0.035 | 1.000 | 0.008 | 0.009 | 0.029 | 0.036 | 0.059 |
| surcharge | -0.004 | 0.008 | 0.008 | 1.000 | 0.030 | -0.076 | 0.079 | 0.004 |
| tip_amount | 0.320 | 0.146 | 0.009 | 0.030 | 1.000 | 0.153 | 0.472 | 0.003 |
| tolls_amount | 0.319 | 0.564 | 0.029 | -0.076 | 0.153 | 1.000 | 0.327 | 0.022 |
| total_amount | 0.975 | 0.371 | 0.036 | 0.079 | 0.472 | 0.327 | 1.000 | 0.007 |
| vendor_id | 0.004 | 0.007 | 0.059 | 0.004 | 0.003 | 0.022 | 0.007 | 1.000 |
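The matrix mixes numeric columns with categorical ones (payment_type, mta_tax, vendor_id), so more than one correlation measure must be involved. For the numeric pairs, a rank correlation such as Spearman's rho is a common choice. That choice is assumed here rather than confirmed by the report. Categorical pairs would need a different measure, such as Cramér's V. A minimal standard-library sketch of Spearman's rho with average ranks for ties, where `average_ranks`, `pearson`, and `spearman` are illustrative helpers and the input is the first ten sample rows:

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# fare_amount and total_amount from the first ten sample rows (toy input).
fares = [6.5, 6.0, 5.5, 5.0, 9.5, 9.5, 6.0, 34.0, 5.5, 13.0]
totals = [7.0, 7.0, 7.0, 6.0, 10.5, 10.0, 6.5, 39.3, 7.0, 13.5]
print(f"Spearman rho on the sample: {spearman(fares, totals):.3f}")
```

A constant column would make `pearson` divide by zero, so a fuller implementation would need to guard against that case.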
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
Sample
First rows
| | medallion | hack_license | vendor_id | pickup_datetime | payment_type | fare_amount | surcharge | mta_tax | tip_amount | tolls_amount | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 89D227B655E5C82AECF13C3F540D4CF4 | BA96DE419E711691B9445D6A6307C170 | CMT | 2013-01-01 15:11:48 | CSH | 6.5 | 0.0 | 0.5 | 0.0 | 0.0 | 7.0 |
| 1 | 0BD7C8F5BA12B88E0B67BED28BEA73D8 | 9FD8F69F0804BDB5549F40E9DA1BE472 | CMT | 2013-01-06 00:18:35 | CSH | 6.0 | 0.5 | 0.5 | 0.0 | 0.0 | 7.0 |
| 2 | 0BD7C8F5BA12B88E0B67BED28BEA73D8 | 9FD8F69F0804BDB5549F40E9DA1BE472 | CMT | 2013-01-05 18:49:41 | CSH | 5.5 | 1.0 | 0.5 | 0.0 | 0.0 | 7.0 |
| 3 | DFD2202EE08F7A8DC9A57B02ACB81FE2 | 51EE87E3205C985EF8431D850C786310 | CMT | 2013-01-07 23:54:15 | CSH | 5.0 | 0.5 | 0.5 | 0.0 | 0.0 | 6.0 |
| 4 | DFD2202EE08F7A8DC9A57B02ACB81FE2 | 51EE87E3205C985EF8431D850C786310 | CMT | 2013-01-07 23:25:03 | CSH | 9.5 | 0.5 | 0.5 | 0.0 | 0.0 | 10.5 |
| 5 | 20D9ECB2CA0767CF7A01564DF2844A3E | 598CCE5B9C1918568DEE71F43CF26CD2 | CMT | 2013-01-07 15:27:48 | CSH | 9.5 | 0.0 | 0.5 | 0.0 | 0.0 | 10.0 |
| 6 | 496644932DF3932605C22C7926FF0FE0 | 513189AD756FF14FE670D10B92FAF04C | CMT | 2013-01-08 11:01:15 | CSH | 6.0 | 0.0 | 0.5 | 0.0 | 0.0 | 6.5 |
| 7 | 0B57B9633A2FECD3D3B1944AFC7471CF | CCD4367B417ED6634D986F573A552A62 | CMT | 2013-01-07 12:39:18 | CSH | 34.0 | 0.0 | 0.5 | 0.0 | 4.8 | 39.3 |
| 8 | 2C0E91FF20A856C891483ED63589F982 | 1DA2F6543A62B8ED934771661A9D2FA0 | CMT | 2013-01-07 18:15:47 | CSH | 5.5 | 1.0 | 0.5 | 0.0 | 0.0 | 7.0 |
| 9 | 2D4B95E2FA7B2E85118EC5CA4570FA58 | CD2F522EEE1FF5F5A8D8B679E23576B3 | CMT | 2013-01-07 15:33:28 | CSH | 13.0 | 0.0 | 0.5 | 0.0 | 0.0 | 13.5 |
Last rows
| | medallion | hack_license | vendor_id | pickup_datetime | payment_type | fare_amount | surcharge | mta_tax | tip_amount | tolls_amount | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 14776605 | A8262FA0AFCB6C7229F6888EAFBDE076 | 1BDF89260FEF1AE6FDDE839A0278D31D | CMT | 2013-01-07 07:29:06 | CSH | 52.0 | 0.0 | 0.5 | 0.0 | 4.8 | 57.3 |
| 14776606 | A8262FA0AFCB6C7229F6888EAFBDE076 | 1BDF89260FEF1AE6FDDE839A0278D31D | CMT | 2013-01-07 14:30:23 | CSH | 9.5 | 0.0 | 0.5 | 0.0 | 0.0 | 10.0 |
| 14776607 | F33EF464441839C6F0DABAABBC93B45D | 313F66DD09C308EADA3B307F6B8CF7A9 | CMT | 2013-01-10 10:56:47 | CSH | 7.5 | 0.0 | 0.5 | 0.0 | 0.0 | 8.0 |
| 14776608 | 56CE01E7DBE0E6449FA1758F082D8884 | 4C6FE2FCFED26629D515D291EC1516A0 | CMT | 2013-01-10 14:50:01 | CSH | 20.0 | 0.0 | 0.5 | 0.0 | 0.0 | 20.5 |
| 14776609 | 32201027CDC62D654DC3AD9747A07C96 | B8DDB9F8143017E22104050B26C2A65D | CMT | 2013-01-05 08:58:18 | CSH | 10.5 | 0.0 | 0.5 | 0.0 | 0.0 | 11.0 |
| 14776610 | B33E71CD9E8FE1BE3B70FEB6E807DD15 | BAF57796E45D921BB23217E17A372FF6 | CMT | 2013-01-06 04:58:23 | CSH | 13.0 | 0.5 | 0.5 | 0.0 | 0.0 | 14.0 |
| 14776611 | ED160B76D5349C8AC1ECF22CD4B8D538 | 3B93F6DA5DEBDE9560993FA624C4FF76 | CMT | 2013-01-08 14:42:04 | CSH | 7.5 | 0.0 | 0.5 | 0.0 | 0.0 | 8.0 |
| 14776612 | D83F9AC0E33F6F19869C243BE6AB6FE5 | 85A55B6772275374EF90AC9457DC1F83 | CMT | 2013-01-10 13:29:23 | CSH | 6.0 | 0.0 | 0.5 | 0.0 | 0.0 | 6.5 |
| 14776613 | 04E59442A7DDBCE515E33CD355D866E7 | 7913172189931A1A1632562B10AB53C4 | CMT | 2013-01-06 16:30:15 | CSH | 9.5 | 0.0 | 0.5 | 0.0 | 0.0 | 10.0 |
| 14776614 | D30BED60331C79E3F7ACD05B325ED42F | B5E1D2461A5BCC8819188DACEC17CD69 | CMT | 2013-01-05 20:38:46 | CSH | 5.0 | 0.5 | 0.5 | 0.0 | 0.0 | 6.0 |
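In every sample row shown, total_amount equals fare_amount + surcharge + mta_tax + tip_amount + tolls_amount (row 7: 34.0 + 0.0 + 0.5 + 0.0 + 4.8 = 39.3). This fits the 0.975 correlation between fare_amount and total_amount. The report does not show whether the identity holds for all 14,776,615 rows. A minimal check follows, assuming the amounts are read as decimal strings so that `Decimal` arithmetic is exact. `total_matches` and `COMPONENTS` are illustrative names:

```python
from decimal import Decimal

# Columns assumed to add up to total_amount (true for every sample row shown above).
COMPONENTS = ("fare_amount", "surcharge", "mta_tax", "tip_amount", "tolls_amount")


def total_matches(row):
    """Check total_amount == sum of components exactly, using Decimal to avoid float error."""
    expected = sum((Decimal(row[c]) for c in COMPONENTS), Decimal("0"))
    return expected == Decimal(row["total_amount"])


# Three rows copied from the Sample tables (indices 7, 14776605, 14776610).
rows = [
    {"fare_amount": "34.0", "surcharge": "0.0", "mta_tax": "0.5",
     "tip_amount": "0.0", "tolls_amount": "4.8", "total_amount": "39.3"},
    {"fare_amount": "52.0", "surcharge": "0.0", "mta_tax": "0.5",
     "tip_amount": "0.0", "tolls_amount": "4.8", "total_amount": "57.3"},
    {"fare_amount": "13.0", "surcharge": "0.5", "mta_tax": "0.5",
     "tip_amount": "0.0", "tolls_amount": "0.0", "total_amount": "14.0"},
]
print(all(total_matches(r) for r in rows))
```

On the full column, counting the rows where `total_matches` is false would show whether the identity is exact or only typical.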